RATE ANALYSIS OF INEXACT DUAL FIRST ORDER METHODS: 
APPLICATION TO DISTRIBUTED MPC FOR NETWORK SYSTEMS 
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Abstract. In this paper we propose and analyze two dual methods based on inexact gradient 
information and averaging that generate approximate primal solutions for smooth convex optimiza- 
tion problems. The complicating constraints are moved into the cost using the Lagrange multipliers. 
The dual problem is solved by inexact first order methods based on approximate gradients and we 
prove sublinear rate of convergence for these methods. In particular, we provide, for the first time, 
estimates on the primal feasibility violation and primal and dual suboptimality of the generated 
approximate primal and dual solutions. Moreover, we solve approximately the inner problems with 
a parallel coordinate descent algorithm and we show that it has linear convergence rate. In our 
analysis we rely on the Lipschitz property of the dual function and inexact dual gradients. Further, 
we apply these methods to distributed model predictive control for network systems. By tightening 
the complicating constraints we are also able to ensure the primal feasibility of the approximate 
solutions generated by the proposed algorithms. We obtain a distributed control strategy that has 
the following features: state and input constraints are satisfied, stability of the plant is guaranteed, 
whilst the number of iterations for the suboptimal solution can be precisely determined. 

Key words. Inexact dual gradient algorithms, parallel coordinate descent algorithm, rate of 
convergence, dual decomposition, estimates on suboptimality and infeasibility, distributed model 
predictive control. 

1. Introduction. Different problems from control and estimation can be ad- 
dressed within the framework of network systems p2]. In particular, model predictive 
control (MPC) has become a popular advanced control technology implemented in net- 
work systems due to its ability to handle hard input and state constraints. Network 
systems are complex and large in dimension, whose structure may be hierarchical, 
multistage or dynamical and they have multiple decision-makers. Such systems can 
be broken down into smaller, more malleable subsystems called decompositions. How 
to consider the relationships between these various decompositions has led to much 
of the recent work within the general subject of the study of network systems. 

Decomposition methods represent a powerful tool for solving distributed control, 
estimation and other engineering problems. The basic idea of these methods is to 
decompose the original large optimization problem into smaller subproblems which 
are then coordinated by a master problem. Decomposition methods can be divided 
into two main classes: primal and dual decomposition methods. In primal decompo- 
sition the optimization problem is solved using the original formulation and variables, 
while the complicating constraints are handled via methods such as interior point, 
penalty functions, feasible directions, Jacobi [ll[71[Tni[I71[5S] . In dual decomposition 
the original problem is rewritten using Lagrangian relaxation and then solve the dual 
problem {InSji^l^- When the original problem is characterized by both simple and 
complicating constraints, dual decomposition may represent an appropriate choice 
since the complicating constraints can be moved into the cost using Lagrange mul- 
tipliers and then the inner problems, that have simple constraints, are solved and 
the dual variables are updated with a Newton or (sub)gradient algorithm. Dual fast 
gradient methods based on exact first order information with provable guarantees on 
suboptimality are given in [18] for general convex problems and |23j for QP's. Dual 



*The authors are with Automation and Systems Engineering Department, University Politehnica 
Bucharest, 060042 Bucharest, Romania. Corresponding author: I. Necoara, Tel. -1-40-21-4029195, 
Fax +40-21-4029195, Email ion. necoaraQacse. pub.ro. 

1 



2 



I. Necoara, V. Nedelcu 



methods based on subgradient iteration and averaging, that produce primal solutions 
in the limit, can be found e.g. in [TTKT^IHZ] • Converge rate analysis for the dual sub- 
gradient method has been studied e.g. in |19| . where the authors provide estimates 
of order 0{1/Vk) for suboptimality and feasibility violation of the approximate so- 
lutions. Thus, an important drawback of the dual methods is that feasibility of the 
primal variables can be ensured only at optimality, which is usually impossible to 
attain in practice. However, in many applications, e.g. from control and estima- 
tion, the constraints can represent different requirements on physical limitation of 
actuators, safety limits and operating conditions of the controlled plant. Neglecting 
these constraints can reduce economic profit and cause damage to the environment 
or equipments. Therefore, any control or estimation scheme must ensure feasibility. 
Further, there is no convergence rate analysis in any of the existing literature for in- 
exact dual (fast) gradient schemes. Thus, our goal is to develop inexact dual gradient 
algorithms which provide approximate primal solutions that are suboptimal and close 
to feasibility. 

There are many ways to ensure feasibility of the primal variables in distributed 
MPC, e.g. through constraint tightening [3J|S1[TH[21] or distributed implementations 
of some classical methods such as the method of feasible directions, penalty functions, 
Jacobi and others [51[71 fTUl[T51[^ . In [5], a dual distributed algorithm for solving the 
MPC problem for systems with coupled dynamics and constraints is presented. The 
algorithm generates a primal feasible solution using primal averaging and constraint 
tightening. The Jacobi algorithm from 2 is used to update the primal variables, while 
the dual variables are updated using the subgradient method in jl9]. The authors 
prove the convergence of the algorithm using the analysis of the dual subgradient 
method from |19) which has very slow convergence rate. In [T^], the authors propose a 
decentralized MPC algorithm that uses the constraint tightening technique to achieve 
robustness while guaranteeing robust feasibility of the entire system. In [l0l[24] . 
distributed MPC algorithms for systems with coupled constraints is discussed. The 
approach divides the single large planning optimization into smaller subproblems, 
each planning only for the controls of a particular subsystem. Relevant plan data is 
communicated between subproblems to ensure that all decisions satisfy the coupled 
constraints. In [TBIHI] cooperative based distributed MPC algorithms are proposed 
that converge to the centralized solution. In TF a distributed MPC algorithm is 
proposed based on agent negotiation. In [4|F distributed algorithms based on interior 
point or feasible directions are proposed that also converge to the centralized solution 
and guarantees primal feasibility. An iterative distributed model predictive control of 
large-scale nonlinear systems subject to asynchronous and delayed state feedback is 
discussed in [M] . See also [6j[T7l[25] for recent surveys of distributed and hierarchical 
MPC methods. While most of the work cited above focuses on a primal approach, 
our work develops for the first time efficient dual methods that ensure constraint 
feasibility, tackles more general problems and more complex constraints and provides 
much better estimates on suboptimality. 

Contribution. The contributions of the paper are as follows: 

1. We propose and analyze novel dual algorithms with low complexity and 
fast rate of convergence that generate approximate primal solutions for large 
smooth convex problems. 

2. We introduce a general framework for inexact first order information and then 
propose two inexact gradient methods for solving the dual (outer) problem: 

• an inexact dual gradient method, with rate of convergence of order 
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0{l/k). 

• an inexact dual fast gradient method, with convergence rate of order 

0(l/fc2). 

3. For both methods we provide for the first time a complete rate analysis and 
estimates on primal/dual suboptimality and feasibility violation of the gen- 
erated approximate solutions. 

4. In our schemes we solve the inner problems only up to a certain accuracy ein 
by means of a parallel coordinate descent method for which we prove linear 
rate of convergence. 

5. For convex optimization models arising from distributed MPC problems, we 
adapt our algorithms using a tightening constraints approach, such that the 
convergence rates of the methods are preserved but in addition we are also 
able to ensure the primal feasibility. 

6. To certify the complexity of the proposed methods, we apply the new al- 
gorithms on several linear distributed MPC problems with state and input 
constraints. 

Paper outline. The paper is organized as follows. In Section [2] we introduce 
the dual problem of our original optimization problem formulated in Section 11.11 
In Sections 12.21 and 12.31 we develop inexact dual gradient and fast gradient schemes 
for solving the outer problem and analyze their convergence rates. In Section [3] we 
propose a parallel coordinate descent method for solving the inner problems and prove 
its convergence rate. In Section |3] we first show how the distributed MPC problem 
for a network system can be recast in the form of our optimization model. Then, we 
combine the new dual algorithms with constraint tightening in order to ensure primal 
feasibility and stability. Finally, in Section[5]we provide extensive simulations in order 
to certify the efliciency of the newly developed algorithms. 

Notation: We work in the space R" composed by column vectors. For u, v g M" 
we denote the standard Euclidean inner product (u,v) = Y^^=i^i^i^ norm ||u|| = 
(u, u) and projection onto non-negative orthant M" as [u]^. We use (•, •), ||-|| and 
[•]^ for spaces of different dimension. For a real number a, [aj denotes the largest 
integer which is less than or equal to a. For any e e [0,1] we say that a quantity q is 
of order 0{p{e)) if there exists c > such that q < cp(e). Further, for a convex set 
U, relint(U) denotes the relative interior and D\j its diameter D\j — max jju— v||. 

u.vGU 

For a matrix G £ Rp^", ||G|| and |1G||f denote the 2-norm and Frobenius norm, 
respectively. 

1.1. Problem formulation. We are interested in solving the following large- 
scale smooth convex optimization problem: 

(1.1) i^* = min {F(u) : h{u) < 0} , 

u(ETJ 

where F : M" M and the components of h : M" — > W are convex functions, and 
U C M" is a compact, convex set. Further, we assume that F and the components 
of h are twice differentiable. We also assume that the projection on the set defined 
by the complicating constraints (called also coupling constraints) h{u) < is hard to 
compute, but the set U is simple, i.e. the projection on this set can be computed very 
efficiently (e.g. hyperbox, Euclidean ball, etc). 

In this paper we consider the following assumptions: 

Assumption 1.1. (i) Function F is ap-strongly convex w.r.t. \\ ■ \\ (see J 2 11 
Definition 2.1.2]). 
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(ii) The Jacobian of h is bounded on U, i.e. there exists a constant Ch > such 
that: 

\\Vhiu)\\F<Ch VueU. 



Assumption 1.2. Slater condition holds for (jl.ip . i.e. exists u S relint{U) with 
h{u) < 0. 

Note that as a consequence of Assumption (|1.2p . we have that strong duahty 
holds for (ILTI) . 

2. Solving the dual problem using inexact first order methods. Our 

goal is to solve the optimization problem using dual gradient based methods. 

In order to update the dual variables we use inexact dual gradient methods (Sections 
12.21 and 12. 3p . while the inner problems are solved up to a certain accuracy by means 
of a parallel coordinate descent algorithm (Section [3]). An important feature of our 
algorithms consists of the fact that even if we use the inexact gradient of the dual 
function, after a certain number /cout of outer iterations, we are still able to compute a 
sequence of primal variables u'^°"t which are eout-optimal and their feasibility violation 
is also less than O{eout), i-e.: 

(2.1) u^-°-eU, ||[/i(fi'=°-)] + ll <0(eout) and -0{e,^t) < F{u''°-^)^ F* < 0{e,ut). 

2.1. A framew^ork for inexact first order information. We assume that 
the projection on U is simple but the projection on the set defined by the coupling 
constraints h(u) < is hard to compute. Therefore, we move the complicating con- 
straints into the cost via Lagrange multipliers and define the dual function: 

(2.2) d(A) = min/:(u, A), 

utEU 

where C{u, A) = ^(u) + (A, h{u.)) denotes the partial Lagrangian w.r.t. the compli- 
cating constraints h{u) < 0. We also denote by u(A) an optimal solution of the inner 
problem: 

(2.3) u(A) e argmin/:(u, A). 



Based on Assumption 11.11 the gradient of the dual function d{X) is given by 2, Ap- 
pendix A]: 

Vd(A) = h{u{\)). 

The following lemma gives a characterization of the Lipschitz property for the gradient 
Vd(A): 

Lemma 2.1 (see Appendix). Let the function F and the components of h be 
twice differentiable and Assumvtion \l.l\ holds. Then, the gradient Vd{X) is Lipschitz 
continuous with constant: 

l^d — — ■ 

Under strong duality (see Assumption II. 2p we have for the outer problem: 
(2.4) F*=ma^di\), 

A>0 
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for which we denote an optimal solution by A*. Since we cannot usually solve the 
inner optimization problem (|2.3p exactly, but with some inner accuracy obtaining an 
approximate optimal solution u(A), we have to use inexact gradients and approximate 
values of the dual function d. Thus, we introduce the following two notions: 

J(A) = C{u{X), A) and W(A) = h{u{X)). 

If we assume that u(A) is computed such that the following inner Cin-optimality holds: 

(2.5) u(A)eU, C{u{X),X)-C{u{X),X)<'f, 

then the next lemma provides bounds for the dual function d{X) in terms of a linear 
and a quadratic model which use only approximate information of the dual function 
and of its gradient. 

Lemma 2.2. J3 Section 3.2] Let Assumvtions [Ll\ and \J7B hold and for a given X 
let u(A) be computed such that (j2.5p is satisfied. Then, the following inequalities are 
valid: 

(2.6) 0>d(Ai)-[J(A) + (W(A),M-A)]>-Ld||Ai-A||2-e„ V/i e . 



Proof. For linear functions h, this lemma is proved in ^ Section 3.2] with stopping 
criterion ein/2 in (|2.5p . For general convex functions h satisfying Assumption ll.il (ii) 
we can easily show that ||/i(u) — /i(v)|| < \/2ch||u — v|| and then following exactlty 
the same steps as in [9] we get the result in p.6p . □ 

Remark 2.3 Relation p.Sp represents the stopping criterion for solving the inner 
problem ()2.3p . Many optimization methods offer direct control of this criterion (see 
e.g. the method of Section|31). For affine functions h (see e.g. MPC problems in Section 
H]), the stopping criterion in (|2.5p can be taken as [9]: u(A) G U and £(u(A),A) — 
£(u(A),A)<^. 

2.2. Inexact dual gradient method for solving the dual (outer) problem. 

In this section we analyze the convergence properties of an inexact dual projected 
gradient algorithm for solving approximately the dual problem ()2.4p . Let {a"' }j>Q be 

a sequence of positive numbers and S'^ = j=o ■ consider the following inexact 
dual gradient algorithm: 



Algorithm (IDG)(A°) 

Given A'' G M.^, for fc > compute: 

1 . u'' « arg min C{u,X'') such that holds 



u6U 

2. A'^+i - [A'= 



^Vd(A'=)] 



Recall that inexact gradient Vd(A'^) = h{u'^) and a'^ e 



1 1 

2L' 2Ld 



IS a given 



step size with L > L^. The following theorem provides an estimate on the dual 
suboptimality for algorithm (IDG): 

Theorem 2.4. Let Assumptions \l.l\ and \l.S\ hold and the sequences (u*^, A'^)^,^^ 
be generated by algorithm (IDG) and define the average sequence of dual variables 
A*^ = -^^^j^qOi^ X^~^^ . Then, the following estimate on dual suboptimality can be 
derived for dual problem (|2.4p ; 



(2^7) 



F--.(A',<gi 
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where we define: 



i?d = ||A* -A"| 



Proof. Let us first notice that tlie update of the dual variables can be equiva- 
Icntly written as A'^+^ = argmin ||A — A''!!^ — (Vd(A'^), A — A*^) 

optimality condition reads: 



for which the 



(2.8) 



l^yk+i _ _ Q,fevd(A'=), A - A'^+1) > VA > 0. 



If we now define r;^ = jjA-' — A|p for any A > 0, then we have: 



lA^ 



+1 



A^' + A^' - A|p = r{ + 2(A^+i - A^ A^' - \^+^ + A^'+i - A) + \\X^+^ - \^\ 



r{ + 2(AJ'+i - A^ A^+i - A) - \\X^+^ - A^ ||^ 



(2.9) 



? r{ - 2a^ (Vd(A^), A - A^') + 2a^ [(Vd(A^'), A^+i-A^)-Ld||A^+^-A^ |p] 

< r{ + 2a^ [d{X^) - d{\)] + 2a^ [d(A^+i) - d{X^) + €;„] 
= r{ + 2a^[d{X^+^) - d(A) + ein] VA > 0, 

where in the first inequality we use the fact that < jIT^- Summing up these 
inequalities for j — 0, . . . , k and using the definition of A'' we can write: 



25* 



d(A) - d(A'^) 



23" 



VA > 0. 



Letting now A = A*, dividing both sides of the previous inequality by 25''^ and taking 
into account that S'' > ^ we obtain ^^77} . □ 



We can observe that the first term in the estimate ()2.7p represents the standard rate 
of convergence of the gradient method for the class of smooth functions [21]. Also, 
the second term ein is the error induced by the fact that the gradient is computed 
only approximately and shows that algorithm (IDG) does not accumulate errors. 

However, we are now interested in finding estimates for primal feasibility violation 
for original problem (jl.ip . Let us introduce the following average primal sequence: 



(2.10) 



1 



3=0 



The following theorem provides an estimate on primal feasibility violation for problem 

Theorem 2.5. Under the assumptions of Theorem \2.4\ and with u'^ defined in 
()2.10p . the following estimate on primal feasibility violation can be derived for the 
original problem (jl.ip ; 



(2.11) 

where v{k, ein) 



\\[h{u'')] + \\<vik,e,„) Vfc>0, 

4LJ?,d , 6£||A"|| 



fe+1 



2a/ j,_|_iem- 
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Proof. Using the definition of A^^^ we liave that the following component-wise 
inequalities hold: A-' + a^Wd{X^) < A-'^^ for all j > 0. Summing up these in- 
equalities for j = 0, . . . , fc and taking into account that Wd{X^) — h{u^) we obtain: 
^^^pa^7i(u^) < A''^^ — A*^ < A'^+^, which together with the convexity of h gives: 

Hu'') < Since A''+^ > we also have that < [/i(u'')]_^ < and thus we 

can further write: 

II A'^+Ml 

(2.12) II [Ma')], II < 



Thus, in order to find an estimate on primal feasibility violation, we have to upper 
bound the norm of the dual sequence A*^"''"'^ . For this purpose we can use (j2.9p with 
A = A*: 

||A^'+i - X*f < \\X^ -X*f + 2a= [d(A-'+i) - d{X*) + . 

Summing up these inequalities for j = 0, . . . , fc, using (A", A*) > and d{X'^^) < 
d{X*), we get: 

pfe+i _ A*||2 < ||A*||2 + ||A"||2 + 25*^ein, 

Now, using the Cauchy-Schwartz inequality we get the second order inequality in 
||A^+i||: 

IIA'^+ill^ - 2||A*||||A'=+1|| - ||A°||2 - 25'^ein < 0. 

Therefore, || A'^"'""'^ || must be less than the largest root of the corresponding second-order 
equation: 

||,.«|| ^ 2||A-H[4||A-f + 4||A°f + 85>.„]-^' ^ ^H^.^ ^ ^ ^ 

<2||A--A"||+3||A°|| + v'2SS;;:, 

where in the second inequality we used that \/Ci~-F^ < VCi + VC^- Introducing this 
inequality in (|2.12p and taking into account that S'' > we obtain (|2.1ip . □ 



Theorem 2.6. Let the assumptions of Theorem \ 2.5\ hold. Then, the following 
estimates on primal suboptimality can be derived for the original problem (jl.ip ; 

(2.13) - (i?d + ||A°||) vik, e,„) < F(u'^) - F* < + e,„. 

Proof. In order to prove the left-hand side inequality we can write: 

F* = d{X*) = minF(u) + {X*,h{u)) < F(u'=) + {X*,h{u'')) 

< Fin'-) + {X*,[hiu')] + ) < Fiu'^) + ||A*||||[Ma'=)] + || 

= F(u'=) + ||A* - A" + AO||||[;i(u'=)] + || < F(u'=) + (i?d + ||AO||)||[/i(u'=)] + ||, 

which together with (|2.1ip lead to the result. 
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Now, we prove the right-hand side inequality. Taking A = in the first inequality of 
(ESD we get: 

\\X^+^f -2a'{Vd{X^),X^) < \\Xm^ + 2a' [{Vd{X^),X^+^-X^)-Ld\\X'+^~Xm^] 



< 



\\X^\\^ + 2a^ [rf(A^+i) -d(AJ) + , 



Taking into account that Vd(A-') = /i(u^) and using the definition of d{X^) we have: 
— (Vd(A^ ), A-') ~ F(u^) — d{X^). Using this relation in the previous inequality we get: 

||A^+i||2 + 2ai [F(u^) - J(A^)] < \\X^\'^ + 2a^ [d(A^+i) - d{X^)] + 2a^ei„. 

Summing up these inequalities for j ~ 0, . . . , fc and taking into account that F is 
convex and d concave, we obtain the following inequality: 



F(v^)-d{V') < ||A°|p + 25^ei, 



Dividing both sides of the previous inequality by S'' and using that S'^ > and 
d(A'=) < F*, we obtain dllTSl). □ 



Now, for a desired accuracy eout for solving problem (jl.ip . we are interested in 
finding the number of outer iterations fcout and a relation between eout and ein such 
that primal feasibility violation and suboptimality satisfy (|2.ip and, moreover, the 
dual suboptimality will be also less than ©(eout)- For simplicity, we consider the 
initial iterate A° = and thus i?d = ||A*||. Further, we consider a constant step size 
— Using Theorems 12.41 [^31 and I^TBl we can take: 



Eout 



and ei, 



€out ; 



for which we obtain the following estimates for primal feasibility violation and sub- 
optimality: 



2eout < ^^(u*--°-) - < eout and F* - d(A'==-)< T^out 



From the previous discussion it follows that in the algorithm (IDG) the inner prob- 
lems p.3[) need to be solved with about the same accuracy as the desired accuracy of 
the outer problem, i.e. ei„ = eout in the stopping criterion ()2.5p . 

2.3. Inexact dual fast gradient method for solving the dual (outer) 
problem. In this section we discuss an inexact dual fast gradient scheme for updating 
the dual variable A. A similar algorithm was proposed by Nesterov in j22| and applied 
further in [18 for solving dual problems with exact gradient information. An inexact 
version of the algorithm can be also found in The scheme defines two sequences 

( A'^, A'^ ) for the dual variables: 

V / A;>0 
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Algorithm (IDFG)(A0) 

Given A" e M^, for fc > compute: 

1 . u*^ « arg mill £(u, A'=) such that ^li) holds 



2. A* 



+ 2r7Vrf(A'=) 



o \fe+l _ k+l\k J 2 

— fc+3 



A° + 5rTELo^Vd(A^) 



where we recaU that Vd(A'^) = /i(u'^). Based on Theorem 4 in [9], which is an 
extension of the results in to the inexact case, we have the following result 

which will help us to establish upper bounds on primal and dual suboptimality and 
feasibility violation for our method. 

Lemma 2.7. Theorem 4] If Assumvtions \l.l\ and \1.2\ hold and the sequences 

u'^, A*^, A*^ J are generated by algorithm (IDFG), then for all k > we have: 

I k>0 



{k + l)(fc + 2) ^ niax-LrfllA - A°|p + ^ i±l [J(A^) + (Vd(A^), A - A^)] 



A>0 



(2.14) 



s=0 

(fc + l)(fc + 2)(fc + 3) 
12 ' 



VAe 



The following theorem provides an estimate on the dual suboptimality for algo- 
rithm (IDFG): 

Theorem 2.8. Let A ssumvtions l l.l\ and \l.S\ hold and the sequences ^u''', A*^, A''^ 

be generated by algorithm (IDFG). Then, an estimate on dual suboptimality for (|2.4 
is given by: 



k>0 



F* - dCx') < + {k + l)e.^, 



(2.15) 



with i?d defined as in Theorem \2.4\ 

Proof. Using the first inequality from (j2.6p in (|2.14p we get: 



ik + l)ik + 2)^^^. 



s=0 



12 



Dividing now both sides by ^ rearranging the terms and taking into account 

that d{X*) = F*, (fc + l)2 < (fc + l)(fc + 2) and (fc + 3)/3 < fc + 1 we obtain ^JQ. □ 



We can observe that the first term in the estimate (|2.15l) represents the standard 
rate of convergence of the fast gradient method for the class of smooth functions [21] . 
Also, the second term (fc + l)ein is the error induced by the fact that the gradient 
is computed only approximately and shows that algorithm (IDFG) accumulates the 
errors. 

Further, we are interested now in finding estimates on primal feasibility violation and 
primal suboptimality for our original problem (|l.ll) . For this purpose we define the 
following average sequence for the primal variables: 



(2.16) 



V 2(. + l) 
£^(fc + l)(fc + 2) 
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The next result gives an estimate on primal feasibility violation. 

Theorem 2.9. Under the assumptions of Theorem \2.8\ and vl^ generated by 
(|2.16p . an estimate on primal feasibility violation for original problem (II. ip is given 
by: 

(2.17) ||[/^(u^)] + || <«(fc,e„), 



where v{k, e„) = + + 4^^e,„. 

Proof. Using (|2.14p . the convexity of F and h and taking into account that 
(A: + 3)/3 < fc + 1, we can write for any A G R^: 

(2.18) max-^^^^^||A- A"||2 + (A,/^(u'=)) < (A: + l)ei„ + rf(A'^) - F(u'=). 

For the second term of the right-hand side we have: 

d{X'') - F(u'=) < d{X*) - F(u'=) = minF(u) + (A*,/i(u)) - F{u'') 

(2.19) < F(u'=) + {X*,h{u')) - F{u') = {X*,h{u'^)) < (A*, [h{u'^)] + ), 

where in the last inequality we used that A* > 0. By evaluating the left-hand side 
term in at A = -^^I^ and taking into account that ([ft.(u'=)]+, /i(u'=) - 

[ft.(u'^)] + ) = we obtain the following inequality: 



„, ,0,12 , /X w,-,fe^^ + 



2 



k\^ ||2 



(2.20) _iM^ + (A",[/^(u'^)]^). 

Combining now (j2.19p and (j2.20p with ()2.18p . using the Cauchy-Schwartz inequality 
and introducing the notation a = || [/i(u'^)]^ ||, we obtain the following second order 
inequality in a: 

^eZT" ""^ -A ||a-(fc + i).i„-^^^<o. 

Therefore, a must be less than the largest root of the second-order equation, from 
which together with the definition of i?d and the identity \/Ci + C2 < VCi + '^^ 
get the result. □ 



Theorem 2.10. Assume that the conditions in Theorem \2.9\ are satisfied and 
let VL^ be given by (|2.16l) . Then, the following estimate on primal suboptimality for 
problem (II. ip can be derived: 

(2.21) - + |1A0||) vik, £,„) < F(u^-) ~F* < ^Tpf^ + (fc + l)e». 



Proof. The left-hand side inequality can be derived similarly as in the previous 
section (see the proof of Theorem l2.6l) . In order to prove the right-hand side inequality 
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we use dsn]): 

F(u'=)-d(A'=)<-max-^^l^||A-A"|r + (A,Mfl')> + ^ 

A=o 4£d||A°|p k + 3 
- (fc + l)2 + 3 

Taking now into account that c?(A'^) < F* and (fc + 3)/3 < k + 1 we get the result. □ 



Similar to the previous section, assume that we fix the outer accuracy to a desired 
value eout- We are interested in finding the number of outer iterations fcout and a 
relation between eout and ein such that primal feasibility violation and suboptimality 
satisfy (|2.ip . For simplicity, we again consider A° = and thus — ||A*||. Using 
now Theorems 12.81 [2791 and [2TT0l we can take: 



^out — 

for which we obtain: 



Cout 



and 



-6eout < F(u*'°-) -F* < 2eout and F* - d(A'=°-) < Scout- 



Note that for these choices of fcout and Cin the inner problems (j2.3l) have to be 
solved with an accuracy of order O (cout V^out) , i-e. ein = eoutv^eout/(2i?dV'id) in 
()2.5p . We can conclude that the (IDFG) method is more sensitive than the (IDG) 
method due to the error accumulation. 

Remark 2.11 (i) Since in practice we usually cannot compute exactly the value 
— 11^* II 5 we can use instead the following upper bound ,19, Lemma 1]: 

F(u) - dCX) 

(2.22) i?d < 7^d = ^ ' ^ ' 



mini<j<p {-/iJ(u)}' 

where u denotes a Slater vector for problem p.ip (see Assumption 11.21) and A € 
M!J. . The effects of this choice on the overall performance of the new algorithms are 
discussed in Section [HT] 

{ii) The results presented in Sections 12.21 and 12.31 also hold in the case when we solve 
the inner problems exactly, i.e. Cin = in (|2.5p . or when U = R", i.e. the inner 
problems are unconstrained. 

(Hi) Note that if A'^ = and we solve the inner problems exactly, i.e. Cin = 0, then we 
have F(u''°"') < F* , i.e. we are always below the optimal value in algorithms (IDG) 
and (IDFG). 

3. Solving the inner problem using a parallel coordinate descent method. 

In this section we propose a block-coordinate descent based algorithm which permits 
to solve in parallel, for a fixed A*^, the inner optimization problem (j2.3p : 

(3.1) u'' = argmin£(u. A*"), 

We consider for the variable u the partition u — [u[ . . . u^^] ^ and the constraints 
set U can be represented in the form of a Cartesian product U = Ui x • • • x \Jm, 
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with Ui e Ui C R"* being simple sets, i.e. the projection on these sets can be 
computed very efficiently. We also define the following partition of the identity matrix: 
I =[Ei... Em] e R"""", where E, e R"^"- for ah i = 1, . . . , M, n = Y.f=i Thus, 
u can be represented as: u — X)f=i ^i'^i- 

Since for each outer iteration the dual variable A*^ is fixed, for the simplicity of 
the exposition we will drop the second argument of £, i.e. we will use the notation 
£fc(u) = £(u. A''). We will also denote by £^ = >Cfe(u'') the optimal value of (|3.ip . 
We define the partial gradient of Ck at u, denoted Vi£fc(u) e R"', as Vi£fe(u) = 
EjVCk{vi) for alH = 1, . . . , M. 

We consider the following assumption on the gradient of Ck '■ 
Assumption 3.1. The gradient of Ck is coordinatewise Lipschitz continuous with 
constants Li > 0, i.e. for all i — 1, . . . , A/; 

\\V,Ck{^x + Ed^)-V,Ck{n)\\ < U Vue R",d, G R"-. 

We recall that £fc(u) = _F(u) + (Afe, /i(u)) . Assumption l3.1l is valid for example if F 
has coordinatewise Lipschitz continuous gradient and the components of h are linear or 
convex quadratic functions. Note also that coordinatewise Lipschitz continuity also 
implies global Lipschitz continuity on extended space R", with Lipschitz constant 

M 

^ Li. Further, based on Assumption 1 1.1) since F is crp-strongly convex we have that 
i=i 

Ck is also strongly convex (with a parameter ac) w.r.t. the Euclidean norm. We also 
assume that Ui C R"' are simple, compact, convex sets (e.g. hyperbox, Euclidean 
ball, entire space R"', etc). There exist many parallel algorithms in the literature 
for solving the optimization problem p.ip : e.g. Jacobi algorithms [21IE], coordinate 
descent methods |28j, etc. However, the rate of convergence for these algorithms 
is guaranteed under more conservative assumptions than the ones required for the 
parallel coordinate descent method proposed in this section. 

Due to Assumption 13 . 1 1 we have [20, Section 2]: 
(3.2) 

Ck{M + Edi)<Ck{M) + {V,Ck[M),d,) + -^\\d£ VueR", d, eR"', i = l,...,M. 
We introduce the following norm for the extended space R": 

M 

(3.3) |lu||^ = ^L,||u,|^ 

i=l 

which will prove useful for estimating the rate of convergence for our algorithm. Since 
Ck is (T£-strongly convex w.r.t. the Euclidean norm, it is also strongly convex w.r.t. 
with parameter ai < 77-^, where imax = max^^i;. Then, the following 

inequality holds |21) : 

(3.4) /:fc(w)>/:fe(u) + (V/:fc(u),w-u) + ^||w-u||? Vw,ueR" 

and combining it with (|3.2p we can deduce that cti < 1. 

For solving the inner problem p.ip we propose the following parallel coordinate 
descent method, which is similar to the algorithm from |28] . but has much simpler 
iteration: 
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Algorithm (PCD)(u'='0) 




Given u*^'", for 


I > 0: 




For i = 1 , . . . , M compute in parallel 


1. v,^' = 






2. u^'+i = 


1 1 M-1 k,l 
M^i Af • 



From the optimality conditions for v*^'' we get: 



(3.5) 



V,/:fe(u'^'') + L,(v 



fe,/ 



> Vvi e U, 



Taking v.^ = u*^'' in p.Sp and combining with 



and convexity of >Cfc we can con- 



clude that algorithm (PCD) decreases the objective function at each inner iteration 
I: 



Remark 3.2 Note that if the sets U; are simple and Ck has cheap coordinate deriva- 
tives, then computing v*^'' can be done numerically very efficient. For example, in 
case of hyperbox sets, the projection on Ui can be done in O (rii) operations and if 
we also consider Ck to be quadratic, then the cost of computing ViCkiu) is 0(n-ni). 
Moreover, if its Hessian is sparse, then the cost of computing Vi£fc(u) is usually 
much cheaper. Thus, for quadratic problems the worst case complexity per iteration 
of our method is 0{n?). Note that the complexity per iteration of the Jacobi type 



methods from HHHH] is at least 0{r 



'^f^i nf) provided that the local quadratic 
subproblems are solved with an interior point solver. 

The following theorem provides the convergence rate of algorithm (PCD) and em- 
ploys standard techniques for proving convergence of the projected gradient method [5D1 



Theorem 3.3. Let Assumvtion \3.1\ hold and Ch be ai-strongly convex w.r.t. 
Then, the following linear rate of convergence is achieved for algorithm (PCD): 



£k{u'''')-Cl<[l- 



2ai 



Af(l + CTi) 



1.0 
2 " 



Ck{n'''")-Cl 



where r° = ||u'''° — u'^ljf . 

Proof We introduce the following term: rj^ — [[u'''' — u^^H^ = J2i=i (^'^i'^ 

where we recall that u*^ is the optimal solution of p.ip and u*^ = Efu^. Further 
using (j3.5p and similar derivations as in 20^ we can write: 



M 



1 k,l 

<r'-—y(^ 

1=1 ^ 



/-, 1 N k.l 
(1 )U,- ^ 



k,l k.l 
V, — u. 



Vr ( k l \ k,l k.l 



V^Ck{n^^'),nf~u. 



k,l 



By convexity of Ck and p.2p we obtain: 



<r 



2(A.(u 



k,l + l\ 
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If we now take w = u'^ and u = u*-''' in (|3.4p and use the previous inequality we get: 

(3.6) \r\r + A(u'=-'+i) - CI < + Ciu"'') - CI - ^{Ckiu''-') C*, + ^r^). 

From the strong convexity of Ck in p.4p we also get: £fc(u'^'') — C\ + > fTir|^. 
We now define 7 = e [0, 1] and using the previous inequality we obtain: 

£.(u'='Va + Y''L<7 {Ckiu''-')-Cl + ^rl) + (1 - J)a^ri. 
Using this inequality in p.6p we get: 

irf^,+£.(u'=^'+i) -Cl<{l-^) Qrf + A.(u'=^') - . 
Applying this inequality iteratively, we obtain for I > 0: 

lrf+Ck{n'''')-Cl<{l-^)' Qrg + C,{u''^') - , 
and by replacing 7 = we obtain the result. □ 



We can conclude from Theorem 13.31 that the number of inner iterations which has 
to be performed such that stopping criterion (|2.5p holds for an inner accuracy ein is 
given by pT) : 



(3.7) 



ML 



2 

max 1 '-'-^max-'-^u 



In- 



3r 



The output of algorithm (PCD) is u*^ — u'^^'i-. To conclude, we present now the 
following algorithmic framework for solving the original problem (jl.ip : 

Algorithm (Inexact dual (fast) gradient method). 
Initialization: Choose an outer accuracy eout- 
Compute fin and fcout as in Sections 12.21 or [^751 
Choose an initial point A" e M.^. 
Outer loop: For fc = 0, 1, . . . , fcout, perform: 

Step 1. Inner loop: For given A*"', choose u*^'" e U. 
Compute hn as in eq. p.7p . 

For 1 = 0,1,..., lin apply algorithm (PCD) to obtain u*"' = u'''-''". 
Step 2. Compute the approximate gradient Wd{Xk) = h{u''). 
Step 3. Update A'=+^ as in Alg. (IDG) or (A'^+i, A'=) as in Alg. (IDFG). 
Step 4. Update average sequences (u'"', A*"). 
Output: generated approximate primal-dual solutions (u'^, A'''). 



4. Distributed MPC problems for constrained network systems. In this 
section we apply the algorithms (IDG), (IDFG) and (PCD) for solving in a dis- 
tributed fashion MPC problems arising in network systems. 
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4.1. MPC formulation for network systems. We consider discrete-time net- 
work systems, which are usually modelled by a graph whose nodes represents subsys- 
tems and whose arcs indicate dynamic couplings between these subsystems, defined 
by the following linear state equations: 

(4.1) x,{t + l)= J2 ^v^jit) + B^JUJ{t) Vi = 1, . . . ,M, 

where M denotes the number of interconnected subsystems, Xi{t) E K"^* and Ui{t) € 
R""i represent the state and the input of ith subsystem at time t, Aij £ R"^i^"^3 
and Bij S jg"^;^"", ^nd A/"* denotes the neighbors of the ith subsystem including i. 
In a particular case frequently found in literature [T5l[T8l[28] the influence between 
neighboring subsystems is given only in terms of inputs: 

(4.2) x,it + 1) = Aux,{t) + J2 B.,jUj{t). 

We also impose local state and input constraints: 

x^{t)£X„ u,{t)eU, V^ = 1,...,M, i>0, 

where Xi C R"^i and Ui C M""i are simple convex sets. For a prediction horizon of 
length iV, we consider quadratic stage and final costs for each subsystem i: 

where matrices Qi,Pi and Ri are positive definite and ||a;||p = x^Px. 

We now formulate the centralized MPC problem for ()4.ip . for a given initial state 

x: 

M N-1 

(4.3) F*{x)= min ||a;,(t)||^^+||u,(t)||^^ + ||x.(7V)||^^ 
S.t.: Xi{t + 1) = X! ^y^i(^) + BijUj{t), Xi{0) = Xi, 

x^{t)eX„ u,{t)eU^, x,{N)^Xl V^ = 1,...,M, t = 0,...A^-l, 

where Xl are terminal sets chosen under some appropriate conditions to ensure sta- 
bility of the MPC scheme (see e.g. [2^). For the input trajectory of subsystem i and 
the overall input trajectory we use the notations: 

u, = [u,{Qf ...u,{N -if]^ u= [uf ...<,] ^gM". 

We assume in addition that the local constraints sets Ui,Xi and the terminal 
sets xl are polyhedral for all subsystems. An extension to general convex sets is 
straightforward and we omit it here due to space limitations. By eliminating the states 
from the dynamics (|4.ip . problem (|4.3p can be expressed as a large-scale quadratic 
convex optimization problem of the form: 

(4.4) F*{x)^ min ^u^Hu + (Wx w)^u 

uieUi,...,UMeUj;f 2 

s.t.: Gu -t- Ex + g < 0, 
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where H e R"^" is positive definite due to the assumption that all Ri are positive 
definite and the inequalities Gu + Ea; + g < 0, with G £ Rp^", are obtained by 
eliminating the states from the constraints Xi{t) e Xi and Xi{N) e for all i and t. 
If the projection on the input constraints set Ui is difficult, we can also move the input 
constraints in the complicating constraints Gu + Ex + g < 0. In this case — R"' . 
Otherwise, i.e. the set Ui is simple (e.g. hyperbox), the convex set Ui = Htli Ui- 
In MPC, at each time instant, given the initial state x G Xn, where Xn C Y[i=i -^i 
is a region of attraction |26], we need to solve the optimization problem (|4.3p or 
equivalently (|4.4|) . We assume for (|4.4|) that for any x G Xn there exists a "strict 
Slater" vector u, i.e. u g U and Gu + Ex + g < 0. 

In the following sections we discuss how we can solve the MPC problem (|4.4p by 
combining the algorithms (IDG), (IDFG) and (PCD) with tightening constraints 
techniques. We will derive estimates for the number of iterations required for finding 
a suboptimal feasible solution. 

4.2. Tightening the coupling constraints. In many applications, like e.g. 
the MPC problem discussed above, the constraints may represent different require- 
ments on physical limitation of actuators, safety limits and operating conditions of 
the controlled plant. Thus, ensuring the feasibility of the primal variables, i.e. u g U 
and Gu + Ex + g < 0, becomes a prerequisite. However, as we have seen in Sections 
12.21 and 12. 3[ dual methods can ensure these requirements only at optimality, which is 
usually impossible to attain in practice. Therefore, in our approach, instead of solving 
the original problem (|4.4p . we consider a tightened problem (see also 8 for a similar 
approach where the tightened dual problem is solved using a subgradient algorithm 
with very slow convergence rate of order O and approximate solutions for the 

inner problems are computed using the Jacobi algorithm [2]). 

We introduce the following tightened problem associated with the original prob- 
lem 



with u being a strict Slater vector for (|4.4p . Note that for this choice of ec, we 
have that u is also a strict Slater vector for the tightened problem (|4.5p . Similar to 
Section [21 for problem ()4.5p we also denote by C^a the partial Lagrangian w.r.t. the 
complicating constraints Gu + Ex + g + < and by d^^ the corresponding dual 
function. 

In the following sections we will see how we can ensure the feasibility, subop- 
timality and stability of the MPC scheme given in (|4.3p based on the suboptimal 
input u*^""' obtained by solving the tightened problem (|4.5p with the newly developed 
algorithms (IDG)/(IDFG) and (PCD). 

4.3. Feasibility and suboptimality of the MPC scheme. At each time 
instant of the MPC scheme, given the initial state x in the region of attraction Xjv, 
instead of solving the optimization problem (|4.4p we solve the tightened problem (j4.5p 



(4.5) 




where e e denotes the vector with all entries 1 and 



(4.6) 



0<ec<7: min {-(Gu + Ex + g) .}, 

/. — ^ r> 
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using the algorithms (IDG) or (IDFG) for the outer problem and algorithm (PCD) 
for the inner problem. At each step we obtain a suboptimal input u*^""' and according 
to the receding horizon strategy we apply to the system only the first input u*^""* (0) . 
However, we want that the generated control sequence u*-'""* to be suboptimal and 
feasible for the original MPC problem (|4.4p . Thus, we first need to find a relation 
between F* {x) and F*{x). Let us denote by A* an optimal Lagrange multiplier for 
the inequality constraints in (|4.5p . The following upper bound can be established for 
any strict Slater vector u and dual multiplier A e MJ^: 



frm F{x, u) - miuugu F{x, u) 
< 



A, Gu + E.T + g + ece 



F(x, u) — miuugxj u) 



rain {- (Gu + Ex + g + ec)^-} 



A, Gu + Ex + g 



(4.7) 



< 2TZd 



V.T e Xn, 



min {-(Gu + Ex + g)-}-ec 



where in the last inequality we used (j4.6p and the fact that both A and Cc are nonnega- 
tive. Taking into account that {u : Gu + Ex + g + < 0} C {u : Gu + Ex + g < 0} 
we have: 

(4.8) F* (x) > F*{x) Vx e Xn- 

On the other hand, from the dual formulation of the tightened problem ()4.5[) we have: 

F*^{x) ~ min F(x, u) + (A*_^ , Gu + Ex + g + ece) 

(4.9) = min F(x, u) + ( A^ , Gu + Ex + g) + (A^ , ece) 

< maxminF(x, u) + (A, Gu + Ex + g> +^ec||A* || < F*{x) + 2^7^dec■ 

A>0 uGU ^ 



We will further see how we can use relations (14.81) and (|4.9p to recover the primal 
suboptimality for the original problem (|4.4p from the suboptimality of the tightened 
problem (|4.5I) . based on the results from Section [21 We now discuss the suboptimality 
and the feasibility of the MPC scheme based on the algorithms (IDG) and (IDFG). 

For the algorithm (IDG) we assume that the outer accuracy eout is chosen such 
that: 



eout < (Vp + 0-05) 7^d min {- (Gu + Ex + g) ,}. 



Based on the results stated in Section [22] and relations (|4.8p and (|4.9p we can choose, 
for example, the following values for the number of outer iterations fcout, the inner 
accuracy ein and also for the tightening parameter ec: 



(4.10) 



10 (2^ + 0.l)id7^^ 



eout 



eout 



eout 



20 (2v^ + 0.l)' 



(2^ + 0.l)7^d■ 
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Using the previous choices for fcout: Cin and in Theorem 
[Gu'=-+Ex + g + ece]^ < ^ 

fcout + -L 

which implies that for all j = 1, . . . ,p, we can write: 



we have: 



id 



Since (Gu''°"' + Ea; + g + ec)j < [(Gu''°"' + Ea; + g + ec)j] _^ we have that u''°"' e U 
and Gu'^°"» + Ex + g < and thus algorithm (IDG) guarantees feasibility of the 
primal variable u*^""'. Further, using now Theorem 12.61 together with (14.81) and (|4.9p 
we have that < F{x, u*^""') — F*{x) < Cout and since u'^°"t is feasible, we get: 

0<F(x,u'=°-)-i^*(x) <eout 

and thus the MFC scheme based on algorithm (IDG) is also eout-suboptimal. 

In order to prove the suboptimality and feasibility of the MFC scheme based on 
algorithm (IDFG) we proceed in a similar way as for algorithm (IDG). We assume 
that the outer accuracy Cout is chosen such that: 

Cout < (VP + 0.5)7^d min {- (Gu + Ex + g) }. 

Based on the convergence properties of algorithm (IDFG) presented in Section 
and relations (|4.8I) and (|4.9I) we can choose: 



(4.11) 



Eout 




8V2VL2nd (2^+1) 



(2y^+ l)7^d■ 



The eout-suboptimality and feasibility of the MFC scheme based on algorithm (IDFG) 
can be proved now in a similar way as the one for algorithm (IDG) using Theorems 
andHH i-e.: 

< F(x, u'=°"' ) - F* (x) < eout and 



-,fc„ut 



e U, Gu''™' + Ex + g < 0. 



In conclusion, in our MFC scheme from our suboptimal and feasible control sequence 
i^fcout only the first input u'^out (0) is applied to the system according to the receding 
horizon strategy. 

4.4. Stability of the MPC scheme. For stability analysis, we express for the 
entire network system the dynamics, the matrices corresponding to the total stage 
and final costs, and the total terminal set as: x{t + 1) = Ax{t) + Bu{t),Q,R,P 
and X^, respectively. Further, the next state in our MFC scheme is denoted x+ = 
Ax + Bu'=°"' (0) and a new sequence of feasible inputs for the MPC problem at the 

next state x+ is denoted with u+ = (u*^™' (1))'^ . . . (u'=°"'(iV - 1))^ {Kx{N)f , 
where u — Kx is a linear feedback controller. In this section we will make use of the 
following assumptions: 
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Assumption 4.1. (i) The terminal constraint set is positively invariant for 
the closed-loop system x{t + 1) ~ {A + BK)x{t), i.e. for all x £ mt{X^) we have that 
{A + BK)xe int(Xf). 

(ii) The following relation holds: 

(4.12) F(a;+,u+) < F(a;,u''°"') - ||a;||^ ^x e Xn- 

Assumption 14.11 is standard in the the MPC framework (see also [HIES])- Moreover, 
distributed synthesis procedures for finding the matrices K an P for the terminal 
controller and terminal cost such that Assumption 14. II holds can be found e.g. in |16| . 

Based on Assumption 14.11 (i) and the fact that Gu'''°"' + Ea; + g < we can 
immediately see that u+ is a strict Slater vector of the MPC problem (j4.4l) with 
initial state a;+. Therefore, in the MPC problem for the next state x^ we update the 
strict Slater vector as explained above, i.e.: 



u 



u'=-'(l))^... (u'=-'(iV-l))' {Kx{N))' 



and thus u"'" is also feasible for tightened problem 

In order to prove asymptotic stability of the MPC scheme for all x E Xn we use 
similar arguments as in [8l[26] by showing that F{x, u''°"t) is a Lyapunov function: 

F{x+, (u'=-)+) < F*(x+) + 6+ , < Flix+) + 6+ , < Fix+,u+) + 6+ , 



< F(a;,u'=-')-||2^llQ + e. 



Q ' "^out 1 



where e^^^ denotes the outer accuracy for solving MPC problem (|4.5p at initial state 
x'^ . From the previous discussion we have that choosing e.g. 



(4.13) 



Eout < min|^||a;||^, c(p) ^ jnin ^{- (Gu+ + Ej;+ + g)^.}| , 



we get asymptotic stability of the closed-loop system. Here, c{p) — ^ + 0.05 for 
algorithm (IDG) and c{p) = ^ + 0.5 for (IDFG). 

4.5. Distributed implementation. In this section we discuss some technical 
aspects for the distributed implementation of our inexact dual decomposition methods 
in the case of MPC problem (|4.3p and its equivalent form (j4.4[) . 

Usually, for the dynamics (|4.1I) the corresponding matrices H and G obtained 
after eliminating the states are dense and despite the fact that algorithms (IDG), 
(IDFG) and (PCD) can perform parallel computations (i.e. each subsystem needs 
to solve small local problems) we need communication between TV steps neighborhood 
subsystems [4l[8]. However, for the dynamics (j4.2|) the corresponding matrices H and 
G are sparse and in this case in our algorithms (IDG), (IDFG) and (PCD) we can 
perform distributed computations (i.e. the subsystems solve small local problems in 
parallel and they need to communicate only with one neighborhood subsystems as 
detailed below) . Indeed, if the dynamics of the subsystems are given by (|4.2[) , then 
Xi(t) = J^,nXi(^) + SjgA^i ~ ^i^d thus the matrices H and G 

have a sparse structure (see e.g. [4l[28]). In particular, the complicating constraints 
have the following structure: for matrix G the (?,j) block matrices of G, denoted 
Gij, are zero for all j ^ A/"' for a given subsystem i, while the matrix E is block 
diagonal. Further, if we define the neighborhood subsystems of a certain subsystem 
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i as ^ U {I : I e Af^ J G U'}, where J\f' = {j : i e 7V^}, then the 
matrix H has all the block matrices = for all j ^ A/"* and the matrix W has 
all the block matrices — for all j ^ for any given subsystem i. Thus, 
the ith block components ol both Vdg^, and VLe^(u, A) can be computed using only 
local information, i.e. each subsystem i = 1, . . . , M does the following synchronous 
computations: 

(4.14) V,d,^(A) = GyUj + EuXi + g^ + e^e 

(4.15) V.L,,(u, A) = ^ Hyu, + ^ (Wyx, + Gj,A,) + w,. 

Note that in the algorithm (PCD) the only parameters that we need to compute 
are the Lipschitz constants L,;. However, in the MPC problems, Li does not depend 
on the initial state x and can be computed once, ofhine, locally by each subsystem 
i as; Li = Amax(Hii). From the previous discussion it follows immediately that 
each subsystem i performs the inner iterations of algorithm (PCD) in parallel using 
distributed computations (see (14.15^ ) for all x E X^. 

Since the algorithms (IDG) and (IDFG) use only first order information, we 
can observe that once Vidc^(A) has been computed distributively, as proved in (|4.14p . 
all the computations for updating the block component corresponding to subsystem 
i in A'^ or A*^ can be done in parallel due to the fact that we have to do only vector 
operations. However, in these schemes all subsystems need to know the global Lips- 

IIGII^ 

chitz constant Ld = ^ . "^-^ that usually is difficult to be computed distributively. In 

practice, a good upper bound on Ld is sufficient, e.g. Ld < min-^^A^^^-^(fl-) ' '^ti^^^ recall 
that II • II F denotes the Frobenius norm. Note that Ld does not depend on x and can 
be computed offline, before starting the MPC scheme. 

In both algorithms (IDG) and (IDFG), another global constant that has to be 
updated is the upper bound on the norm of the optimal multiplier, TZd- Based on 
the theory developed in the previous sections, after some long but straightforward 
computations an easily computed upper bound for the next T?."}" corresponding to the 
MPC problem with initial state x+ is given by: 

^ ' ' 'I - min {-(Gu+ + Ea;+ + (7),}' 

Note that these upper bounds on Ld and TZ^ can be computed distributively in an 
efficient way. 

From the previous discussion wc can conclude that the sequences A^, Afc and 
Ufc, generated by the algorithms (IDG)/(IDFG) and (PCD) can be computed in 
parallel and distributively provided that good estimates for Ld and TZd are known by 
each subsystem. The effects of the upper bound for TZd on the overall performance of 
the MPC scheme are discussed in Sections 15^ 



5. Numerical tests. In order to certify the efficiency of the proposed algo- 
rithms, we consider different numerical scenarios. We first analyze the behavior of 
algorithms (IDG), (IDFG) and (PCD) on randomly generated QP problems and 
then we compare our algorithms with other QP solvers used in the context of dis- 
tributed MPC. The algorithms were implemented on a PC, with 2 Intel Xeon E5310 
CPUs at 1.60 GHz and 4Gb of RAM. 
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5.1. Practical behavior of newly developed algorithms (IDG), (IDFG) 
and (PCD) . We consider random QP problems of the form: 

(5.1) F* ^ min F(u) (= 0.5u^i/u + w'^u), 

lb<u<ub,Gu+g<0 

where matrices H G M"^" and G S R2nxn g^j-g ^^ken from a normal distribution 
with zero mean and unit variance. Matrix H is then made positive definite by the 
transformation H -s— H^H + In- Further, ub = —lb = 1 and w,g are taken from a 
uniform distribution. For different QP dimensions ranging from n = 100 to n = 1000, 
we first analyze the behavior of algorithms (IDG) and (IDFG) in terms of the 
parameters choice. 

For each n, we consider two different estimates for the number of outer iterations 
depending on the way we compute i?d = where A* is an optimal Lagrange 

multiplier. For algorithm (IDG), A;^^ is the average number of iterations obtained 
using the bound TZd given in (|2.22l) - Section [Z!2l while fcS,t,samp is the average number 
of iterations obtained with = ||A*|j, where A* is computed exactly using Matlab's 
Quadprog, iterations which correspond to 10 random QP problems. We also compute 
the average number of outer iterations fc^^ ^.^^^ observed in practice, obtained by 



imposing the stopping criteria |F(u °"t.'^='>i) — F*\ and 



II to be less 

+ 

than the estimates established in Section for an outer accuracy eout = 10~^. Using 
the results from Section [273l we compute in a similar way k^^, fc^*^^^samp ''^i^i't real for 
algorithm (IDFG). The results for both algorithms are presented in Figure [5TT1 We 




Fig. 5.1. Values of k^^^„ fc2,t_,,^p and fcj^^^^^^j (s = {G;FG}) for algorithms (IDG) (left) 
and (IDFG) (right), eout = 10'^ . 



can observe that in practice algorithm (IDFG) performs much better than algorithm 
(IDG). Note that the expected number of outer iterations A:^t,samp ^^'^ ^out,samp 
obtained from our derived bounds in Sections 12.21 and 1 2.31 offer a good approximation 
for the real number of iterations of the two algorithms. Thus, these simulations show 
that our derived bounds are tight. But, when in our derived estimates we use TZ^, then 
fc^J^ is about one order of magnitude, while A:^,^ is about two orders of magnitude 
greater than the real number of iterations. 

Since the estimates for suboptimality and feasibility violation are also dependent 
on the way the inner accuracy ein is chosen, we are also interested in the behavior of the 
two algorithms w.r.t. ein. For this purpose, we apply algorithms (IDG) and (IDFG) 
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fc, Algorithm IDG t Algorithm IDFG 




Outer iterations Outer iterations 



Fig. 5.2. Suboptimality and feasibility violation for algorithms (IDG) (left) and (IDFG) (right) 
for tout = 10~^ and different 

for solving a random QP problem of dimension n — 300, with a fixed outer accuracy 
Eout — 10"'^ and different values of e-m- In Figure [5?^ we plot the primal suboptimality 
and the feasibility violation by letting the two algorithms perform the number of outer 
iterations computed in Sections 12.21 and 12.31 We can observe from Figure 15.21 that if 
the inner accuracy e-m is chosen too large, the desired level of suboptimality cannot be 
attained. We can also see that algorithm (IDG) is less sensitive to the choice of the 
inner accuracy than algorithm (IDFG) due to the fact that algorithm (IDFG) 
accumulates errors (see Theorems 12.61 and 12. lOp . 

In conclusion, we notice from the results of Sections 12.21 and 12.31 and simulations 
that there is a tradeoff between the speed of convergence and robustness: e.g. algo- 
rithm (IDFG) is faster than algorithm (IDG), but the second one is more robust 
since it does not accumulate the errors. Thus, depending on the application, one can 
choose between the two algorithms. 
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1.27 




800 


2.11 




2702 


19.3 


1274 


9.3 
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1000 


2.69 




2851 


23.05 


1375 


10.1 


4.1 



Table 5.1 



CPU time (seconds) and number of inner iterations for algorithms (PCD) and Jacobi ^28f . 

We also implemented for comparison, algorithms (PCD) and Jacobi from [55]. 
Both algorithms were implemented in C code, with parallelization ensured via MPI. 
Table 15.11 presents the average CPU time in seconds and number of iterations for each 
algorithm for 10 random QP problems (j5.1l) with only box constraints. Since the 
convergence rate for algorithm in [28^ is not known, the stopping criterion for each 
algorithm is F{u'') — F* < 10^^, with F* being precomputed using Quadprog. As we 
can see algorithm (PCD) is about 10 times faster than the algorithm in [55]. 
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5.2. MPC for traffic networks. In this section we analyze the behavior of 
algorithms (IDG) and (IDFG) on MPC problems for traffic network systems. In [S] 
the authors show that traffic network systems can be modeled in the form (|4.2I) . We 
generated ring traffic networks with M even junctions (subsystems) and having M/2 
input links and M/2 output links distributed randomly. In order to work with small 
costs, we normalized the state of the system as: x <— x/10^. For the parameters of 
the system and of the MPC problem see [5] and the references therein. Note that the 
number of states or inputs in this traffic network is 3M/2. 
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2 • 10* 
9.8 • 10"^ 
1.2 • 10"* 


2 • 10* 
5.8 • 10"^ 
5.6 • 10"* 


2 • 10* 
8.2 • 10"^ 
2.4 • 10"-' 



Table 5.2 



Averaged number of iterations and cost decrease for different number of junctions (subsystems) . 

The distributed MPC approach with a prediction horizon of iV = 10 steps was 
applied for solving a single time step of the traffic network with M E {6, 12, 18} 
number of junctions using the newly developed algorithms (IDG) and (IDFG) and 
the dual subgradient algorithm in [8]. For each M the results are shown for a set of 
10 initial states obtained at random. Additionally to the input constraints considered 
in we also assume box constraints on the states. We solve the tightened problem 
(j4.5p . obtained from the MPC problem of form (j4.3|) or equivalently (|4.4p . with an 
outer accuracy Cout = 10~^. In Table \5l2\ we report the average number of outer 
iterations fc^^, /c^^ and fcf^ performed by the algorithms (IDFG), (IDG) and the 
algorithm in [8], respectively. We also count the average real number of iterations 
performed by algorithms (IDFG) and (IDG) by imposing the stopping criterion 
Eout and Gu'^°"' + Ex + g < 0. For the dual subgradient algorithm 
in the stopping criterion was chosen as follows: F{x, u'^""' ) — F* < and 
Q{jfcout _j_ _(_ g < 0, where ef^ and the rest of the parameters for this algorithm 
are computed as in |8l Section III.C]. In all three algorithms the inner problems were 
solved with algorithm (PCD). From Table [5T2] we observe that algorithm (IDFG) 
has the best behavior compared to (IDG) and the dual subgradient algorithm in [5]. 
Thus, algorithm (IDFG) is superior in terms of both, predicted (theoretical) and real 
number of iterations (e.g. from 10 to 100 times faster than (IDG)). Further, algorithm 
(IDG) is able to produce a feasible and suboptimal solution in a reasonable number 
of outer iterations, while the dual subgradient algorithm in [8] failed to generate a 
feasible solution within 2 • 10^ outer iterations. We observed that this behavior is due 
mainly to the fact that the step size in (IDG) is larger than that in [8]. 

6. Conclusions. Motivated by MPC problems for complex interconnected sys- 
tems, we have proposed two dual based methods for solving large-scale smooth convex 
optimization problems with coupling constraints. We moved the coupling constraints 
into the cost using duality theory. We solved the inner subproblems only up to a 
certain accuracy by means of a parallel coordinate descent method for which we have 
proved linear convergence. For solving the outer problems, we developed inexact dual 
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gradient and fast gradient schemes for which we provide a full convergence analy- 
sis, deriving upper bounds on dual and primal suboptimality and primal feasibility 
violation. We also discussed some implementation issues of the new algorithms for 
distributed MPC problems and tested them on several practical applications. 
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Appendix. Proof of Lemma [K7\ 
Case 1 - We first consider the unconstrained case, i.e. U — M". Since F is strongly 
convex, it follows that u(A) is unique and thus d is a differentiable function having 
the gradient: 

Vd{\) = Wu{XfWF{u{X)) + h{u{X)) + Wu{XfWh{u{X)fX 

= Vu(A)^ [VF(u(A)) + Wh{u{X)fX] + h{u{X)) = h{u{X)), 

where the last equality is obtained using the optimality conditions for u(A), i.e.: 

(6.1) Vi^(u(A)) + V/i(u(A))^A = VA>0. 

Taking now into account that the components of h are twice differentiable we 
have: 

(6.2) v2d(A) = V/i(u(A))Vu(A). 

Differentiating now the optimality conditions (|6.ip w.r.t. to A we can write: 

p 

Vu{XfV^F{u{X)) + Vh{u{X)) + Vu{Xf J2 A,:V2/i,(u(A)) = 0, 

1=1 

from which we obtain: 



Vu(A)^ 



-V/i(u(A)) 



V2f(u(A))+£a,V2/i,(u(A)) 



Introducing this relation into (|6.2p and taking into account that ^i'^'^ f^ii^W) 'tl 

we have: 



-V^d{X) = Vh{u{X)) 



V^F{u{X)) +j2^^^^h,{u{X)) 



V/i(u(A))^ 



-< 



Vh{u{X))[S/^F{u{X))] 'v/i(u(A))^ 



Since F is crp-strongly convex and thus V^i^(u(A)) >: a-pln and the Jacobian of h is 
bounded (see Assumption (|l.ip ). we can write further: 

||VMA)||<|| [v2i^(u(A))]-'||.||VMu(A))|p<|| [vMu(A))]-^||.||VMu(A))|||<^. 

(Tf 



Thus, we can conclude using Lemma 1.2.2 from JTP that Ld 
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Case 2 - We assume now that U is a compact convex set. Since F is strongly 
convex, the dual function d is still differentiable and given by V(i(A) = h(u{X)). In 
order to show Lipschitz continuity of the gradient, we consider the following family 
of dual functions {dr)T>o- 

(6.3) dr{X) = min F{u) + (A, h{u)) + t6u(u), 

where bjj is a self-concordant barrier function for the set U. Let m(A, r) be the op- 
timal solution of (|6.3p . Using the same reasoning as in the unconstrained case and 
taking into account that V^6u(u) ^ (see [HJ Section 4.2.2]), we have that for any 
given T > the gradient WdT-{X) = /i(u(A,t)) is Lipschitz continuous with constant 

= £i, i.e. \\h{u{\,T)) - h{u{iy,T))\\ < ^\\\ - for ah X,iy > 0. Since for aU 
A > we have (ir(A) — )■ d(A), w(A, r) — u{X) as r — > -1-0 and his a continuous function 
we can conclude that the gradient of the dual function d is also Lipschitz continuous 
with constant = —. □ 
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